Intelligent Block Placement Strategy in Heterogeneous Hadoop Clusters

Authors

  • Lili Sun
  • Yang Yang
  • Zenggang Xiong
  • Xiaoyong Zhao
Abstract

MapReduce is an important distributed processing model for large-scale data-intensive applications. As an open-source implementation of MapReduce, Hadoop provides enterprises with a cost-efficient solution for their analytics needs. However, the default HDFS block placement policy assumes that the computing nodes in a cluster are homogeneous and tries to balance load by placing blocks randomly, which does not allow the system to adapt to differences between nodes. In this paper, we propose a partition-based hierarchical architecture for Hadoop. Based on this architecture, we take the resource characteristics of each node into consideration, e.g. disk space utilization and computing capacity, and propose an intelligent block placement strategy. In this strategy, three mechanisms are given to guarantee the reliability and scalability of Hadoop. Experiments show that the proposed strategies not only ensure load balancing across the whole cluster and minimize the total data moved, but also significantly improve the performance of MapReduce.
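The abstract does not spell out the three mechanisms or the partition-based architecture, so the following is only a minimal sketch of the general idea of resource-aware placement, not the authors' algorithm. It assumes a hypothetical Node record, a hypothetical weight defined as free disk fraction multiplied by relative computing capacity, and weighted random selection of distinct nodes for the replicas of one block. None of these names come from Hadoop or from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a DataNode (not a Hadoop class)."""
    name: str
    disk_used_fraction: float  # 0.0 (empty) to 1.0 (full)
    compute_capacity: float    # relative speed, e.g. 1.0 for a baseline node


def placement_weight(node: Node) -> float:
    """Assumed weight: free disk fraction times relative computing capacity."""
    free_fraction = max(0.0, 1.0 - node.disk_used_fraction)
    return free_fraction * node.compute_capacity


def choose_nodes(nodes, replicas=3, rng=None):
    """Pick `replicas` distinct nodes, favoring nodes with a higher weight."""
    rng = rng or random.Random()
    # Nodes with zero weight (full disk or no capacity) are never candidates.
    candidates = [n for n in nodes if placement_weight(n) > 0]
    if len(candidates) < replicas:
        raise ValueError("not enough eligible nodes for the requested replicas")
    chosen = []
    for _ in range(replicas):
        weights = [placement_weight(n) for n in candidates]
        index = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        # Remove the picked node so no node holds two replicas of the block.
        chosen.append(candidates.pop(index))
    return chosen


if __name__ == "__main__":
    cluster = [
        Node("fast-empty", 0.10, 2.0),
        Node("fast-busy", 0.70, 2.0),
        Node("slow-empty", 0.10, 1.0),
        Node("slow-full", 1.00, 1.0),
    ]
    picked = choose_nodes(cluster, replicas=3, rng=random.Random(0))
    print([n.name for n in picked])
```

In contrast to uniformly random placement, a weighting of this kind sends more blocks to nodes that have both spare disk space and more computing capacity. A real policy would also need to account for rack awareness and rebalancing, which this sketch leaves out.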


Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the Hadoop Distributed File System assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...


An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster

The Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream these data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly without considering any node's resource characteristics, which decreases the self-adaptability of the system. In this paper, we ...


An Optimal Task Assignment Policy and Performance Diagnosis Strategy for Heterogeneous Hadoop Cluster

The goal of the proposed research is to improve the performance of Hadoop-based software running on a heterogeneous cluster. My approach lies at the intersection of machine learning, scheduling, and diagnosis. We mainly focus on heterogeneous Hadoop clusters and try to improve performance by implementing a more efficient scheduler for this class of cluster.


Hadoop Block Placement Policy for Different File Formats

Nowadays, petabytes of data have become the norm in industry. Handling and analyzing such big data is a challenging task. Even though frameworks like Hadoop (an open-source implementation of the MapReduce paradigm) and NoSQL databases like Cassandra and HBase can be used to analyze and store such large data, heterogeneity of data is still an issue. Data centers usually have clusters formed using heterogeneous nodes...


Performance Improvement of Map Reduce through Enhancement in Hadoop Block Placement Algorithm

In the last few years, a huge volume of data has been produced from multiple sources across the globe. Dealing with such a huge volume of data has given rise to the so-called “Big data problem”, which can be solved only with new computing paradigms and platforms, and this led to the emergence of Apache Hadoop. Inspired by Google’s private cluster platform, a few independent software developers develope...




Publication date: 2013